Integrated transcription and identification of named entities in broadcast speech

Authors

  • Steve Renals
  • Yoshihiko Gotoh

Abstract

This paper presents an approach to integrating functions for both transcription and named entity (NE) identification into a large vocabulary continuous speech recognition system. It builds on an NE-tagged language modelling approach that was recently applied to the development of a statistical NE annotation system. We also present results of a proper name identification experiment on the Hub-4 evaluation data.
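
The sketch below illustrates the general idea under stated assumptions. It uses a bigram model in which every token is a (word, NE tag) pair. Because names and their classes live in the same language model, one search over it recovers both the words and their tags. Everything here is a hypothetical illustration and not the authors' system: the `TaggedBigramLM` class, the four-tag set, add-one smoothing and the three training sentences. A real recognizer would combine these scores with acoustic scores and search over word hypotheses as well. To keep the example short, the word string is fixed and only the tags are searched.

```python
import math

# Hypothetical NE tag set for illustration; "O" marks words outside any entity.
TAGS = ["O", "PERSON", "LOCATION", "ORGANIZATION"]
BOS = ("<s>", "O")  # sentence-start token


class TaggedBigramLM:
    """Toy bigram LM whose tokens are (word, NE tag) pairs, with add-one smoothing.

    Hypothetical illustration only; not the model described in the paper.
    """

    def __init__(self, tagged_sentences):
        self.bigrams = {}  # count of (previous token, token)
        self.history = {}  # count of each token used as a bigram history
        vocab = set()
        for sent in tagged_sentences:
            prev = BOS
            for tok in sent:
                self.bigrams[(prev, tok)] = self.bigrams.get((prev, tok), 0) + 1
                self.history[prev] = self.history.get(prev, 0) + 1
                vocab.add(tok)
                prev = tok
        # +1 reserves a slot for unseen tokens (a crude approximation).
        self.vocab_size = len(vocab) + 1

    def logprob(self, prev, tok):
        num = self.bigrams.get((prev, tok), 0) + 1
        den = self.history.get(prev, 0) + self.vocab_size
        return math.log(num / den)

    def tag(self, words):
        """Viterbi search for the most probable NE tag sequence of a given word string."""
        # token -> (log score, tag path) of the best hypothesis ending in that token
        best = {BOS: (0.0, [])}
        for w in words:
            new_best = {}
            for t in TAGS:
                tok = (w, t)
                new_best[tok] = max(
                    ((score + self.logprob(prev, tok), path + [t])
                     for prev, (score, path) in best.items()),
                    key=lambda cand: cand[0],
                )
            best = new_best
        return max(best.values(), key=lambda cand: cand[0])[1]


train = [
    [("president", "O"), ("clinton", "PERSON"), ("visited", "O"), ("london", "LOCATION")],
    [("mr", "O"), ("clinton", "PERSON"), ("spoke", "O")],
    [("he", "O"), ("visited", "O"), ("paris", "LOCATION")],
]
lm = TaggedBigramLM(train)
print(lm.tag(["president", "clinton", "visited", "london"]))
# -> ['O', 'PERSON', 'O', 'LOCATION']
```

Because the tag is part of the token, bigram context learned in training directly shapes which tag sequence wins. For example, "visited" followed by a LOCATION token raises the score of the LOCATION reading of "london".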

Similar resources

Multimedia interaction for the new millennium

Spoken language processing has created value in multiple application areas such as document transcription, database entry, and command and control. Recently scientists have been focusing on a new class of application that promises on-demand access to multimedia information such as radio and broadcast news. In separate research, augmenting traditional graphical interfaces with additional modali...

Real-time rich-content transcription of Chinese broadcast news

This paper describes the recent development of an Audio Indexing System for Chinese (Mandarin) broadcast news. Key issues of the three major components: automatic speech recognition, speaker identification and named entity extraction are addressed. The Chinese-language-specific challenges are discussed and our solutions are described. The recognition accuracy of the final system is comparable t...

OOV Sensitive Named-Entity Recognition in Speech

Named Entity Recognition (NER), an information extraction task, is typically applied to spoken documents by cascading a large vocabulary continuous speech recognizer (LVCSR) and a named entity tagger. Recognizing named entities in automatically decoded speech is difficult since LVCSR errors can confuse the tagger. This is especially true of out-of-vocabulary (OOV) words, which are often named e...

Robust Named Entity Extraction from Large Spoken Archives

Traditional approaches to Information Extraction (IE) from speech input simply consist of applying text-based methods to the output of an Automatic Speech Recognition (ASR) system. While this gives satisfactory results on low Word Error Rate (WER) transcripts, we believe that a tighter integration of the IE and ASR modules can increase IE performance in more difficult conditions. More specifically thi...

Named Entity Recognition on Transcribed Broadcast News: Guidelines for Participants

In the Named Entity Recognition (NER) task, systems are required to recognize different types of Named Entities (NEs) in Italian texts. As in the previous editions of EVALITA, we distinguish four NE types: Person (PER), Organization (ORG), Location (LOC) and Geo-Political Entities (GPE). Participant systems should identify both the correct extension and type of each NE. The output of participan...

Publication date: 1999